Skip to content

SSE_C changes#281

Merged
rajdchak merged 10 commits intoawslabs:mainfrom
rajdchak:sse-c
Jun 11, 2025
Merged

SSE_C changes#281
rajdchak merged 10 commits intoawslabs:mainfrom
rajdchak:sse-c

Conversation

@rajdchak
Copy link
Contributor

@rajdchak rajdchak commented Jun 3, 2025

Description of change

Support passing of sse_c customer key to pass this to s3 for encryption/decryption.

S3A currently has this customer key as Optional String https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/delegation/EncryptionSecretOperations.java#L41 while Iceberg has this key as a String https://github.com/apache/iceberg/blob/f9cc62eb0d98e360b452a3ab8fdc6efdc4969f6e/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L499. So decided to accept this key as Optional String.

Relevant issues

Does this contribution introduce any breaking changes to the existing APIs or behaviors?

Does this contribution introduce any new public APIs or behaviors?

How was the contribution tested?

Does this contribution need a changelog entry?

  • I have updated the CHANGELOG or README if appropriate

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@rajdchak rajdchak had a problem deploying to integration-tests June 3, 2025 16:21 — with GitHub Actions Failure
@rajdchak rajdchak had a problem deploying to integration-tests June 5, 2025 14:24 — with GitHub Actions Failure
@rajdchak rajdchak had a problem deploying to integration-tests June 5, 2025 14:35 — with GitHub Actions Failure
@rajdchak rajdchak changed the title SSE_C initial changes SSE_C changes Jun 5, 2025
@rajdchak rajdchak marked this pull request as ready for review June 5, 2025 14:36
@rajdchak rajdchak had a problem deploying to integration-tests June 5, 2025 14:47 — with GitHub Actions Failure
Copy link
Collaborator

@fuatbasik fuatbasik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rajdchak . My major comment is around dependencies. We should not depend sdk.core for this change. We also should not add any S3 dependency to common.

@rajdchak rajdchak had a problem deploying to integration-tests June 5, 2025 16:53 — with GitHub Actions Failure
@rajdchak rajdchak had a problem deploying to integration-tests June 5, 2025 17:08 — with GitHub Actions Failure
Copy link
Collaborator

@ahmarsuhail ahmarsuhail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, mostly looks good, have some minor comments

@rajdchak rajdchak had a problem deploying to integration-tests June 6, 2025 15:29 — with GitHub Actions Failure
@rajdchak rajdchak had a problem deploying to integration-tests June 9, 2025 10:21 — with GitHub Actions Failure
@rajdchak rajdchak temporarily deployed to integration-tests June 9, 2025 10:53 — with GitHub Actions Inactive
@rajdchak rajdchak requested a review from ahmarsuhail June 9, 2025 11:26
Copy link
Collaborator

@fuatbasik fuatbasik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajdchak thanks a lot for your PR. I put few comments. I have 2 big one concerns we need to re-do.

1/ Please do not share SSE-C key value in plain-text. We can get them from an environment variable but we shouldn't get share them here like this. I think the way it should work is if ENV_VAR has SSE-C not set, it should skip these tests othewise execute them.

2/ Your change breaks integration tests data generation logic today, the tests will fail if a new developer just follows the DEVELOPMENT.MD. Please make sure you are not failing this steps when updating integration tests.

3/ Could you please add examples to README on how to enable SSE-C and how to use it.


static Stream<Arguments> encryptedParquetReads() {
List<S3Object> readEncryptedObjects = new ArrayList<>();
readEncryptedObjects.add(S3Object.RANDOM_SSEC_ENCRYPTED_PARQUET_1MB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how these two objects are generated and uploaded to S3 bucket?

@awslabs awslabs deleted a comment from ahmarsuhail Jun 9, 2025
@rajdchak rajdchak temporarily deployed to integration-tests June 9, 2025 20:21 — with GitHub Actions Inactive
@rajdchak rajdchak temporarily deployed to integration-tests June 9, 2025 23:31 — with GitHub Actions Inactive
@rajdchak rajdchak had a problem deploying to integration-tests June 10, 2025 09:00 — with GitHub Actions Failure
@rajdchak rajdchak temporarily deployed to integration-tests June 10, 2025 10:00 — with GitHub Actions Inactive
Copy link
Collaborator

@ahmarsuhail ahmarsuhail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rajdchak , looks good to me overall.

Same question as @fuatbasik, I'm not familiar with how we do data generation for files we use for our ITests. have we updated that logic so it's possible to create these new files?

Copy link
Collaborator

@ahmarsuhail ahmarsuhail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

test file generation to be addressed in a follow up.

S3AsyncClient s3Client = this.getS3ExecutionContext().getS3Client();
S3SeekableInputStream stream = s3AALClientStreamReader.createReadStream(s3Object);
S3SeekableInputStream stream =
s3AALClientStreamReader.createReadStream(s3Object, OpenStreamInformation.DEFAULT);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could've saved yourself lots of time keeping createReadStream(s3Object) method around and inside call createReadStream(s3Object, OpenStreamInformation.DEFAULT).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nit btw but probably a better way to implement this pattern.

* @param checksum optional checksum, to update
* @param openStreamInformation contains the open stream information
*/
public abstract void readPattern(
Copy link
Collaborator

@fuatbasik fuatbasik Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we want to change existing method signature or add a new method?

@rajdchak rajdchak merged commit 1343077 into awslabs:main Jun 11, 2025
4 checks passed
ozkoca pushed a commit that referenced this pull request Jun 26, 2025
## Description of change
<!-- Thank you for submitting a pull request!-->
<!-- Please describe your contribution here. What and why? -->
<!-- Please ensure your commit messages follow these
[guidelines](https://chris.beams.io/posts/git-commit/). -->

Support passing of sse_c customer key to pass this to s3 for
encryption/decryption.

S3A currently has this customer key as Optional String
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/delegation/EncryptionSecretOperations.java#L41
while Iceberg has this key as a String
https://github.com/apache/iceberg/blob/f9cc62eb0d98e360b452a3ab8fdc6efdc4969f6e/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L499.
So decided to accept this key as Optional String.

#### Relevant issues
<!-- Please add issue numbers. -->
<!-- Please also link them to this PR. -->

#### Does this contribution introduce any breaking changes to the
existing APIs or behaviors?
<!-- Please explain why this was necessary. -->

#### Does this contribution introduce any new public APIs or behaviors?
<!-- Please describe them and explain what scenarios they target.  -->

#### How was the contribution tested?
<!-- Please describe how this contribution was tested. -->

#### Does this contribution need a changelog entry?
- [ ] I have updated the CHANGELOG or README if appropriate

---

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and I agree to the terms of
the [Developer Certificate of Origin
(DCO)](https://developercertificate.org/).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants